and Composition Framework

Pablo Rodriguez-Mier, Carlos Pedrinaci, Manuel Lama, and Manuel Mucientes 


Abstract —In this paper we present a theoretical analysis of graph-based service composition in terms of its dependency with service 
discovery. Driven by this analysis we define a composition framework by means of integration with fine-grained I/O service discovery 
that enables the generation of a graph-based composition which contains the set of services that are semantically relevant for an 
input-output request. The proposed framework also includes an optimal composition search algorithm to extract the best composition 
from the graph minimising the length and the number of services, and different graph optimisations to improve the scalability of the 
system. A practical implementation used for the empirical analysis is also provided. This analysis demonstrates the scalability and flexibility of
our proposal and provides insights on how integrated composition systems can be designed in order to achieve good performance in
real scenarios for the Web.

Index Terms —Semantic Web Services; Service Discovery; Service Composition Framework; Service Composition Performance. 
1 Introduction

SERVICE discovery and composition are in general complex tasks that require considerable effort, especially when vast amounts of services are available. Service discovery solutions range from the initial UDDI proposal that relied on the syntactic description of services and a prefixed categorisation, to more advanced generic solutions able to discover Web APIs and Web services across domains exploiting rich user-provided semantic service descriptions. Similarly, a plethora of service composition solutions have been produced, spanning from mere graphical support to completely automated solutions. Both discovery and composition engines essentially rely on the processing of service descriptions, which increasingly go beyond syntactic representations to include the semantics of the service(s) to enable more advanced computations.

An analysis of the service composition literature highlights that, regardless of the approach, a central task that needs to be frequently performed throughout the composition activity is the discovery of suitable services to use. Whether one looks at fully automated composition engines based on Artificial Intelligence (AI) planning techniques, or at more constrained solutions that rely on pre-defined skeletal plans, or at graph-based approaches focused on semantic input-output parameter matching [14]-[20], service discovery is a central activity that needs to be carried out at every main step during the generation of the composition. Yet, despite the strong dependency between both activities, research and development in both areas have evolved for the most part independently.

On the one hand, service discovery has traditionally been approached as a one-off activity to be sporadically carried out by humans when looking for services. As a consequence, the interface exposed by discovery engines assumes that requests are fully specified in terms of a well-defined interface and categorisation. Moreover, response times of discovery engines are orders of magnitude above what would be acceptable for a composition engine, should it delegate the thousands of discovery requests it needs to issue at composition time. These limitations hamper the development of fast composition systems where discovery and composition are two fundamental, interrelated activities.

On the other hand, partly due to the particularly demanding computational needs of composition algorithms, most composition engines reimplement locally their own discovery methods instead of integrating existing components providing state-of-the-art discovery algorithms. Additionally, this approach relies on the unnecessary and often unrealistic assumption that the entire set of services should be locally available to the composition engine. This assumption requires pre-importing all services locally, which is only viable for those registries providing entire public dumps of the service descriptions they hold. Furthermore, most composition engines do not introduce optimisation techniques to improve the scalability by identifying equivalent or dominant functionality that could appear when many different service registries are involved in the composition. This prevents the use of optimal search strategies since the complexity usually grows exponentially with the number of services.
In order to tackle the previous problems, a composition framework should consider the following characteristics: 1) provide convenient fine-grained discovery mechanisms that could help to discover services able to consume or produce (a subset of) certain types of data as usually required during composition; 2) improve the response time of service discovery to process requests very fast; 3) support the integration of third party service registries as a key activity in the composition phase; 4) incorporate optimizations to improve the scalability of the overall composition process; and 5) find optimal service compositions by minimizing different criteria such as the number of services or the length of the composition to avoid complex, unmanageable solutions.

In this paper we present a graph-based framework focused on the semantic input-output parameter matching of services' interfaces that efficiently integrates the automatic service composition and semantic service discovery. The provided framework takes into account all the characteristics indicated in the above paragraph. Notably, the main contributions are:

1) A formal framework that presents a theoretical analysis of graph-based service composition in terms of its dependency on service discovery, together with a fine-grained I/O discovery interface which reduces the performance overhead without having to assume the local availability and in-memory preloading of service registries. The framework also includes an optimal composition search algorithm to extract the best composition from the graph minimising the length and the number of services, and different graph optimisations to improve the scalability of the system, which as far as we know are not included in other frameworks.

2) A reference implementation of this formal framework based on the adaptation of two independently developed components, namely ComposIT and iServe, respectively in charge of service composition and discovery.

3) A detailed performance analysis of the integrated system, highlighting both the unacceptable performance achieved when using the typical out-of-the-box discovery implementations, as well as the fact that top performance is achievable with the adequate discovery granularity and corresponding indexing optimisations.


The proposed framework is data-flow centric, focused on the semantic I/O parameter matching of services' interfaces and leaving aside preconditions and effects. This is essentially a pragmatic decision in line with the current tendency towards lightweight data-driven approaches. In fact, on the Web less than 5% of the semantic Web services include preconditions and effects [23].

The rest of the paper is organized as follows. Sec. 2 discusses the state-of-the-art. Sec. 3 formalizes the web service composition problem and Sec. 4 presents the framework that defines the composition in terms of service discovery tasks. The subsequent sections describe our reference implementation, explore the performance of the system for different scenarios and, finally, give some final remarks.

2 Related Work 

Automatic composition of Web services is still an open problem that involves multiple research areas. Concretely, lots of efforts have been devoted to automate the discovery and composition using different approaches and techniques. However, most of the research in both areas has evolved independently of each other, despite the significant overlap between these interrelated tasks. This has led to a lack of integrated approaches in the field that consider the performance and the scalability of the overall integrated system as well as the impact of the discovery in terms of response time during the automatic composition task.

From the discovery side, most of the work has been focused on improving the retrieval performance (i.e., precision-recall curve) without much concern about the response time requirements and/or the interface requirements to provide an efficient fine-grained discovery granularity for automatic composition. However, the response time of the discovery systems is recently gaining significant interest. A recent service discovery competition shows some of the newest advances in the automatic discovery field. The most relevant examples are OWLS-MX3, iSem 1.1 and XSSD. The main conclusions that can be drawn from this contest, from the perspective of service composition, are twofold: 1) research efforts are focused on response time improvement via caching and indexing, yet the results are still not sufficient for fast, automatic composition of services; and 2) the interface exposed by discovery engines assumes that requests are fully specified in terms of a well-defined interface and categorisation, i.e., discovery systems expect a precise description of the service in terms of inputs and outputs, and/or other characteristics such as preconditions and effects. However, these interfaces are not adequate for service composition, since one of the assumptions is that there is usually no single service that fully matches a request and therefore several services need to be combined instead. Indeed, during automatic composition, an exploratory search is usually required to guess which relevant services can be selected at each step. This requires launching many partial requests (fine-grained queries), rather than fully specified requests, in order to locate relevant services that match some partial information available to the algorithm (e.g., services that consume some inputs and/or produce some outputs). Fine-grained requests are simpler and can be solved faster than complex, fully specified requests. Thus they are more suitable for automatic composition systems.

From the composition side, most approaches can be categorized into: 1) classical AI planning approaches, where the composition problem is translated into the planning domain and solved using general planners; and 2) graph-based I/O driven approaches that build a graph with the services and their input/output semantic relations (generally omitting the preconditions and effects), and apply graph search techniques to extract (usually optimal) service compositions from the graph. Approaches of the first group differ from our work in the sense that they handle very expressive preconditions and effects to generate composition plans but: 1) the concept of external service registries is missing, services are assumed to be locally available; 2) average response time of these systems is usually high; and 3) optimizations to reduce the number of services by identifying redundant functionality are not considered.

On the other hand, graph-based I/O approaches are gaining much attention since the Web Service Challenge. Some notable works in this field are [14]-[20]. Concretely, [14], [15], [20] are the top-3 algorithms of the WSC'08. Although these approaches show generally good performance and low response times, [14] and [15] do not find optimal solutions and [20] fails to find solutions in large data sets. Additionally, none of these systems considers either the integration with service registries or the use of service optimizations to deal with potential scalability problems.

From the point of view of the integrated frameworks, a very interesting approach was proposed by Kona et al. in [13]. In this paper, the authors present an efficient framework for Web service composition that supports semantic Web service discovery. The composition is generated by performing a forward chaining of operators to find a feasible composition. The authors also evaluated the system with the datasets of the Web Service Challenge 2006 and presented a detailed experimentation. Their results demonstrate the capabilities and the good performance of this system which, however, exhibits some limitations: 1) the notion of an external service registry is missing, all the information required is pre-processed and loaded in the main memory, which is one of the main issues we set out to tackle with this work since it is otherwise not possible to deal with large and/or distributed datasets; 2) the framework does not contemplate service optimisations to remove redundant information; and 3) the work does not perform an optimal search to minimise the cost or the number of services of the composition, as all possible compositions with the shortest length are captured in the composition graph, which should be further processed to extract the optimal composition. Similarly, in [31], Lecue et al. develop an integrated framework for dynamic Web service composition. The framework exploits the semantic input-output matchmaking to discover relevant services and performs automatic composition using a graph-based approach, taking into account functional and non-functional properties. However, graph optimisations are not considered and the composition search is non-optimal, since the selection of the services is merely greedy.

In [32], Da Silva et al. present a framework that effectively supports both automatic semantic discovery and composition, among other relevant phases of the composition life-cycle, such as service publication and service selection, taking into account non-functional properties. One of the limitations of the discovery phase is that it does not support fine-grained requests. In addition, the framework includes neither optimisations to reduce graph size nor an optimal search to extract the best composition from the graph.

In light of the above analysis, we propose a graph-based I/O framework that overcomes all of the analyzed limitations. In this framework the discovery is defined in terms of a fine-grained I/O interface which minimises the performance overhead between both composition and discovery without having to assume the local availability and in-memory preloading of service registries. The proposed framework also includes an optimal composition search algorithm to extract the best composition from the graph minimising the length and the number of services, and different graph optimisations to improve the scalability of the system.

3 Web Service Composition Problem 

Service composition aims to help construct composite services that could fulfil a user request, e.g., booking an entire holiday, when no known service can achieve such a request on its own. A core activity for creating service compositions is, indeed, the discovery of relevant services. In this context, relevant services are those that could be invoked and contribute to obtaining an executable composition that would fulfil the needs set out by the client. We herein formalise the composition problem in close relationship with discovery as a means to better study and approach the integration of discovery and composition engines. The formalisation of the problem is data-flow centric, focussed on the semantic input-output parameter matching of services' interfaces.

3.1 Semantic Web Service Discovery 

The semantic Web service discovery problem consists of locating appropriate services from one or more service registries that are relevant to an input-output request.

Definition 1: A Semantic Web Service (SWS, hereafter "service") can be defined as a tuple $w = \{In_w, Out_w\} \in W$ where $In_w$ is a set of inputs required to invoke $w$, $Out_w$ is the set of outputs returned by $w$ after its execution, and $W$ is the set of all services available in the service registry. Each input and output is related to a semantic concept from an ontology $O$ ($In_w, Out_w \subseteq O$).

Semantic inputs and outputs can be used to discover relevant services as well as to compose the functionality of multiple services by matching their inputs and outputs together. In order to measure the quality of the match, we need a matchmaking mechanism that exploits the semantic I/O information of the services. The different matchmaking degrees that are typically contemplated in the literature are [33]:
• Exact ($\equiv$): An output $o_{w_1} \in Out_{w_1}$ of a service $w_1$ matches an input $i_{w_2} \in In_{w_2}$ of a service $w_2$ with a degree of exact match if both concepts are equivalent.

• Plugin ($\sqsubseteq$): An output $o_{w_1} \in Out_{w_1}$ of a service $w_1$ matches an input $i_{w_2} \in In_{w_2}$ of a service $w_2$ with a degree of plugin if $o_{w_1}$ is a sub-concept of $i_{w_2}$ ($o_{w_1} \sqsubseteq i_{w_2}$).

• Subsume ($\sqsupseteq$): An output $o_{w_1} \in Out_{w_1}$ of a service $w_1$ matches an input $i_{w_2} \in In_{w_2}$ of a service $w_2$ with a degree of subsume if $o_{w_1}$ is a super-concept of $i_{w_2}$ ($o_{w_1} \sqsupseteq i_{w_2}$).

• Fail ($\perp$): When none of the previous matches are found, then both concepts are incompatible and the match has a degree of fail ($o_{w_1} \perp i_{w_2}$).

Note that, in order to discover relevant services to generate data-flow compatible service compositions, the only two valid degrees of match are exact and plugin. On this basis, we define the cmatch (compatible match) function that will be used to discover candidate services during the composition phase:

Definition 2: Given $a, b \in O$, a compatible match $cmatch(a, b)$ holds if and only if $a \equiv b$ (exact match) or $a \sqsubseteq b$ (plug-in match).

Using the previous compatible match function between concepts, we can define the matchmaking operator "$\otimes$" that, given two sets of concepts $C_1, C_2 \subseteq O$, returns the concepts from $C_2$ matched by $C_1$.

Definition 3: Given $C_1, C_2 \subseteq O$, we define "$\otimes : O \times O \to O$" such that $C_1 \otimes C_2 = \{c_2 \in C_2 \mid cmatch(c_1, c_2), c_1 \in C_1\}$. Note that this operator is not commutative.
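
As an illustration of Definitions 2 and 3, the following Python sketch implements cmatch and the $\otimes$ operator over a toy ontology. The ontology (a child-to-parent map), the concept names and the function names `ancestors` and `otimes` are assumptions made for this example only; in the framework the subsumption check would be answered by a semantic reasoner.

```python
# Toy ontology (hypothetical): child concept -> direct parent concept.
PARENT = {"Novel": "Book", "Book": "Product", "Ebook": "Book"}

def ancestors(concept):
    """Yield every strict super-concept of `concept` in the toy ontology."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept

def cmatch(a, b):
    """Compatible match: a is equivalent to b (exact) or a sub-concept of b (plugin)."""
    return a == b or b in ancestors(a)

def otimes(c1, c2):
    """C1 (x) C2: the concepts of c2 matched by at least one concept of c1."""
    return {b for b in c2 if any(cmatch(a, b) for a in c1)}

# The operator is not commutative: a Novel can be used where a Book is
# expected (plugin), but a Book cannot be used where a Novel is expected.
forward = otimes({"Novel"}, {"Book", "Price"})
backward = otimes({"Book", "Price"}, {"Novel"})
```

Here `forward` contains only `Book`, whereas `backward` is empty, which mirrors the remark that the operator is not commutative.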

We can use the previous operator to define the concepts of full and partial matching between concepts.

Definition 4: Given $C_1, C_2 \subseteq O$, a full matching between $C_1$ and $C_2$ exists if $C_1 \otimes C_2 = C_2$, whereas a partial matching exists if $C_1 \otimes C_2 \subset C_2$.

Typically, a service $w = \{In_w, Out_w\}$ is relevant to a request $r = \{In_r, Out_r\}$, where $In_r \subseteq O$ are the provided inputs and $Out_r \subseteq O$ the expected outputs, if $In_r \otimes In_w = In_w$ and $Out_w \otimes Out_r = Out_r$, that is, there is a full match between the provided inputs and the service inputs and a full match between the service outputs and the expected outputs.
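
The distinction between full and partial matching, and the classical notion of relevance to a whole request, can be sketched as follows. This is a minimal example under stated assumptions: the toy ontology, the tuple representation of services and requests, and the names `match_kind` and `fully_relevant` are hypothetical. The sketch also reports an empty result separately as "none", although Definition 4 would count it as a proper subset.

```python
PARENT = {"Novel": "Book", "Book": "Product"}  # hypothetical toy ontology

def cmatch(a, b):
    """True if a is equivalent to b (exact) or a sub-concept of b (plugin)."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def otimes(c1, c2):
    """Concepts of c2 matched by at least one concept of c1."""
    return {b for b in c2 if any(cmatch(a, b) for a in c1)}

def match_kind(c1, c2):
    """Classify the matching between c1 and c2 as full, partial or none."""
    matched = otimes(c1, c2)
    if matched == set(c2):
        return "full"
    # Separating the empty case is a choice of this sketch, not of Def. 4.
    return "partial" if matched else "none"

def fully_relevant(service, request):
    """Both are (inputs, outputs) pairs: full match on inputs and on outputs."""
    s_in, s_out = service
    r_in, r_out = request
    return otimes(r_in, s_in) == set(s_in) and otimes(s_out, r_out) == set(r_out)

request = ({"Novel"}, {"Price"})
service = ({"Book"}, {"Price", "Stock"})
```

With these values, `fully_relevant(service, request)` holds because the provided `Novel` plugs into the required `Book` and the service returns the requested `Price`.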

While this approach is reasonable for discovering the services that best match an entire request (full match), for composition one needs to locate services that are relevant, that is, that match some inputs/outputs (partial match). Thus, rather than approaching the discovery problem based on a full input/output description, we split this problem into two finer-grained discovery problems that are more relevant for service composition: input discovery and output discovery.

Definition 5: Given a set of concepts $C \subseteq O$, the input discovery problem can be defined as finding a set of relevant services $W = \{w_1, \ldots, w_n\}$ where $w_i = \{In_{w_i}, Out_{w_i}\}$ such that $\forall w_i \in W,\ C \otimes In_{w_i} \subseteq In_{w_i}$, that is, services that can consume some (partial match) of the inputs or are directly invokable (full match) with $C$.

Definition 6: Given a set of concepts $C \subseteq O$, the output discovery problem can be defined as finding a set of relevant services $W = \{w_1, \ldots, w_n\}$ where $w_i = \{In_{w_i}, Out_{w_i}\}$ such that $\forall w_i \in W,\ Out_{w_i} \otimes C \subseteq C$, that is, services that produce some or all outputs.

Based on these definitions, we introduce the notion of input and output relevance:

Definition 7: A service $w = \{In_w, Out_w\}$, where $In_w, Out_w \subseteq O$, is input-relevant for a set of concepts $C \subseteq O$ if $C \otimes In_w \neq \emptyset$, whereas the service $w$ is output-relevant for a set of concepts $C \subseteq O$ if $Out_w \otimes C \neq \emptyset$.
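
Definition 7 translates almost directly into two predicates. The sketch below assumes a toy ontology and a hypothetical `Service` record; the service `BookShop` and all concept names are illustrative only.

```python
from typing import NamedTuple

PARENT = {"Novel": "Book", "Book": "Product"}  # hypothetical toy ontology

class Service(NamedTuple):
    name: str
    inputs: frozenset
    outputs: frozenset

def cmatch(a, b):
    """True if a is equivalent to b (exact) or a sub-concept of b (plugin)."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def otimes(c1, c2):
    """Concepts of c2 matched by at least one concept of c1."""
    return {b for b in c2 if any(cmatch(a, b) for a in c1)}

def input_relevant(service, concepts):
    """The concepts match at least one input of the service."""
    return bool(otimes(concepts, service.inputs))

def output_relevant(service, concepts):
    """At least one output of the service matches one of the concepts."""
    return bool(otimes(service.outputs, concepts))

shop = Service("BookShop", frozenset({"Book", "CreditCard"}), frozenset({"Payment"}))
```

For example, `shop` is input-relevant for `{"Novel"}` even though it could not be invoked with that concept alone, since `CreditCard` would still be missing.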

3.2 Semantic Web Service Composition 

The semantic composition problem considered in this work is as follows: Given a request $r = \{In_r, Out_r\}$, where $In_r$ is a set of available semantic input concepts and $Out_r$ a set of requested semantic output concepts, we can define the problem of the automatic construction of a SWS composition as that of finding a composite Web service $w_c = \{In_{w_c}, Out_{w_c}, P = \{S, <\}\}$ such that $In_r \otimes In_{w_c} = In_{w_c}$ (the composite service is invokable with the available inputs) and $Out_{w_c} \otimes Out_r = Out_r$ (the composite service retrieves all the requested outputs). This service consists of a partially ordered set $P$ (a binary relation "$<$" over a set of services $S \subseteq W$). This partially ordered set of services is essentially a Directed Acyclic Graph (DAG) which models the implicit execution order of the services driven by the input/output matches, where nodes of the DAG are services and the arcs are valid semantic matches. This type of composition has many advantages. On the one hand, mapping inputs and outputs to semantic concepts allows reasoning about data types to improve the matchmaking between service parameters, which leads to more possible semantically valid compositions. On the other hand, the DAG representation formally captures the nature of a composition where services may be executed in different orders, i.e., there are many different total (sequential) orderings of a composition that lead to the same result. Moreover, since our approach is data-flow centric, a DAG representation is simpler than a general (possibly cyclic) graph as cycles do not produce new data types in the composition.

However, there are also some drawbacks. First, a DAG representation could impose some restrictions on the compositions that can be generated, i.e., due to the absence of cycles, a service could not explicitly be invoked twice. Second, compositions at different semantic levels rather than just concept matchmaking would definitely improve the quality of the compositions by capturing more possible cases. Furthermore, using input concepts and output concepts to define a composition request is not user friendly. A better way to specify a request would be to define it with keywords. This, nonetheless, could be achieved with a pre-processing step using automatic semantic annotation tools to translate the request from keywords to semantic concepts. Formally, we define a valid composition as follows:
Fig. 1. Overview of the proposed approach.


Definition 8: Let $r = \{In_r, Out_r\}$ and let $w_c = \{In_{w_c}, Out_{w_c}, P = \{S, <\}\}$ be a composite service for the request $r$, where $P$ is a partial order over the set of services $S \subseteq W$ of the composite service $w_c$. We say $w_c$ is a valid composition for request $r$ if and only if, for any topological sort $T = \{w_1, w_2, \ldots, w_n\}$ of $P$, where $w_j = \{In_{w_j}, Out_{w_j}\}\ \forall j \in [1, n]$, the following expression is satisfied:

$(In_r \otimes In_{w_1} = In_{w_1}) \wedge ((In_r \cup Out_{w_1}) \otimes In_{w_2} = In_{w_2}) \wedge \ldots \wedge ((In_r \cup Out_{w_1} \cup \ldots \cup Out_{w_n}) \otimes Out_r = Out_r)$.

This definition implies that every service of the composition must be invokable to obtain an invokable service composition. We say that a service $w = \{In_w, Out_w\}$ is invokable with a set of concepts $C \subseteq O$ if each required input $i_w \in In_w$ is semantically matched by the set of concepts $C$.

Definition 9: If $C \subseteq O$ is the set of available input concepts, then a service $w = \{In_w, Out_w\}$ is invokable with $C$ if $C \otimes In_w = In_w$, i.e., there exists a full matching between the available inputs and the service inputs.

Note that if a service $w$ is invokable with a set of concepts $C$, then it is also input-relevant for the same set of concepts since invokable implies input-relevant, but the inverse does not hold (see Def. 7). That is, the set of invokable services is included in the set of the relevant services.
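
Definitions 8 and 9 can be checked mechanically for a given execution order. The sketch below is a simplified illustration: it validates one sequential ordering only, whereas Definition 8 quantifies over every topological sort of the partial order. The `Service` record, the two book services and the toy ontology are assumptions of this example.

```python
from typing import NamedTuple

PARENT = {"ISBN13": "ISBN"}  # hypothetical toy ontology: ISBN13 is a sub-concept of ISBN

class Service(NamedTuple):
    name: str
    inputs: frozenset
    outputs: frozenset

def cmatch(a, b):
    """True if a is equivalent to b (exact) or a sub-concept of b (plugin)."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def otimes(c1, c2):
    """Concepts of c2 matched by at least one concept of c1."""
    return {b for b in c2 if any(cmatch(a, b) for a in c1)}

def invokable(service, available):
    """Def. 9: full matching between the available concepts and the inputs."""
    return otimes(available, service.inputs) == service.inputs

def valid_order(request_in, request_out, order):
    """Check the conjunction of Def. 8 along one sequential ordering."""
    available = set(request_in)
    for service in order:
        if not invokable(service, available):
            return False
        available |= service.outputs  # outputs become available downstream
    return otimes(available, request_out) == set(request_out)

search = Service("BookSearch", frozenset({"BookTitle"}), frozenset({"ISBN13"}))
price = Service("PriceLookup", frozenset({"ISBN"}), frozenset({"Price"}))
```

Running `valid_order({"BookTitle"}, {"Price"}, [search, price])` succeeds, while the reversed order fails at the first step because `PriceLookup` is not yet invokable.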

The reader should note that we restrict the definition of a compatible match to exact and plugin in order to generate semantically complete compositions. However, the framework also supports the use of other match degrees (e.g., subsume) by relaxing the "cmatch" operator, which in practice means obtaining potentially more matched (but semantically weaker) concepts and thus bigger composition graphs with more services and match relations that could be semantically incomplete. This is supported not only in theory, but also by the reference implementation presented later in the paper.

4 Composition Framework 

On the basis of the formal definition of the problem, in this section we present a graph-based framework for automatic semantic Web service composition. Fig. 1 shows the overview of our approach with the different steps involved. The process is triggered by a composition request that specifies the user requirements in terms of inputs and the expected outputs. This information is used in the composition graph generation phase to build a graph with all the relevant services and the semantic relations between their inputs and outputs. In order to find the relevant services, the composition graph generation phase is interleaved with the discovery phase. The discovery phase is responsible for retrieving the relevant services given the data available at different stages during the composition graph generation phase. The relationships between the inputs and outputs of services are computed in the matchmaking phase, where the semantic matching degree between inputs and outputs is computed using a semantic reasoner. The service composition graph is eventually generated on the basis of the relevant services and the I/O matching information. This graph contains all possible service compositions that satisfy the composition request, in addition to a few others that, although invokable, do not manage to entirely fulfil the request. The service composition graph is then optimised applying different techniques to group and reduce the number of services and relations. Next, an optimal search is performed over the graph to find the optimal composition. This phase is interleaved with a search optimisation phase that analyses and reduces the search space. Finally, the optimised composition workflow is returned.

In this section, we analyse each phase and we provide generic strategies based on the problem description presented in the previous section.

4.1 Semantic Matchmaking 

A fundamental functionality that needs to be available for generating compositions, and even for discovering services, is the ability to analyse the compatibility between different semantic types. This functionality, which we refer to as semantic matchmaking, is in charge of assessing the level of semantic compatibility between concepts, given an ontology (or set of ontologies). To do so, semantic matchmaking relies on semantic reasoning (notably subsumption reasoning) in order to be able to determine the relationships between the concepts (e.g., Plugin match). This mechanism can be used, for example, to discover services that can consume or produce a concrete input/output by finding semantically compatible types. Such a mechanism is also particularly relevant for generating the service composition graph with all the matches between services' inputs and outputs.

The matchmaking system provides a $match(C_1, C_2)$ function which represents the concrete implementation of the $\otimes$ operator defined in Def. 3. The match function tries to find a valid match between the source concepts of $C_1$ and the target concepts of $C_2$ by calling the $cmatch(c_i, c_j)$ function (Def. 2) for each pair $(c_i, c_j)$ of concepts where $c_i \in C_1$ and $c_j \in C_2$. The compatible match function is calculated using a semantic reasoner that returns the semantic relation between two concepts. Then, it checks if the relation is considered a compatible match (i.e., exact or plugin). Each time a compatible match is found between $c_i$ and $c_j$, $c_j$ is added to a set of matched concepts and removed from $C_2$. The reader should note that the goal here is not to find the best match for each element but rather to get all compatible matches for each target element.

The best-case complexity (all $C_2$ concepts matched by the first element from $C_1$) is $O(m)$, whereas the worst-case complexity (no compatible matches at all) is $O(m \cdot n)$ where $n = |C_1|, m = |C_2|$. This implies that, in the worst case, for two sets of elements, there will be at most $m \times n$ calls to the cmatch function, which is ultimately answered by the semantic reasoner.
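
The best-case and worst-case bounds can be observed by instrumenting a small version of the match function. This is only a sketch under assumptions: the reasoner is replaced by a lookup in a hypothetical toy ontology, $C_1$ is passed as a list so that "the first element" is well defined, and the call counter is added purely for illustration.

```python
PARENT = {"Novel": "Book", "Book": "Product"}  # hypothetical toy ontology

def cmatch(a, b):
    """Stand-in for the reasoner: exact or plugin match between a and b."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def match(c1, c2):
    """Return (matched concepts of c2, number of cmatch calls performed)."""
    remaining = set(c2)
    matched = set()
    calls = 0
    for source in c1:
        for target in list(remaining):
            calls += 1
            if cmatch(source, target):
                matched.add(target)
                remaining.discard(target)  # matched targets are not re-tested
        if not remaining:
            break  # every target concept is already matched
    return matched, calls

targets = {"Book", "Product"}                            # m = 2
best = match(["Novel", "Price", "Email"], targets)       # first source matches all
worst = match(["Price", "Email", "Address"], targets)    # nothing matches, n = 3
```

In the best case the function stops after $m = 2$ calls; in the worst case it performs $m \cdot n = 6$ calls and returns an empty set.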

4.2 Semantic Service Discovery 

In order to generate service compositions, it is necessary to be able to discover appropriate services based on their interface. The goal of a typical discovery system is to find atomic services that match entirely a description representing the ideal service sought, i.e., all the inputs and outputs are compatible. However, from the viewpoint of generating data-flow compatible compositions, rather than looking for entire matches, we need to find suitable combinations of services that combined would satisfy a request. In this scenario, the ability to find partially matching services very fast is paramount in order to enable exploring efficiently the many possible combinations of services that could lead to a suitable composition. Therefore, in a nutshell, the type of service discovery that is required for supporting service composition is a more relaxed and finer-grained version of that typically provided by discovery engines, whereby partial matches can be obtained in a very fast manner. This can be achieved by defining a simple fine-grained interface that supports the discovery of services using only partial information (some/any available inputs, some/any expected outputs). Fig. 2 shows the pseudocode of this simple interface to discover relevant services that can be used as a starting point to obtain semantic input/output relevant services, as defined in Def. 7 in Sec. 3.1.

function RelevantIO(C ⊆ O, W, type)
    relevantServ := {}
    for all w_i = {In_wi, Out_wi} ∈ W do
        if type = In then
            if match(C, In_wi) ≠ ∅ then
                relevantServ := relevantServ ∪ {w_i}
            end if
        else if type = Out then
            if match(Out_wi, C) ≠ ∅ then
                relevantServ := relevantServ ∪ {w_i}
            end if
        end if
    end for
    return relevantServ
end function

Fig. 2. Pseudocode to obtain input-relevant and output-relevant sets of services for a particular set of concepts.

The discovery algorithm sequentially scans all services and calls the Match function of the Matchmaker to determine if a service is relevant for an input (the service has at least one input compatible with the inputs provided) or for an output (the service has at least one output compatible with the outputs provided) depending on the Type selected. Therefore, the complexity of this type of discovery is $O(w)$ where $w = |W|$ is the size of the service repository. This implies at most $|W|$ calls to Match in the worst-case scenario, or $O(w \cdot m \cdot n)$ if we consider the complexity of the Match method assuming every service has at most $m$ outputs and $n$ inputs.
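
A direct Python rendering of the pseudocode in Fig. 2 is given below as a minimal sketch. The `Service` record, the in-memory list acting as a registry, the string values `"in"` and `"out"` for the type argument and the toy ontology are all assumptions of this example; they are not part of the reference implementation.

```python
from typing import NamedTuple

PARENT = {"Novel": "Book", "Book": "Product"}  # hypothetical toy ontology

class Service(NamedTuple):
    name: str
    inputs: frozenset
    outputs: frozenset

def cmatch(a, b):
    """True if a is equivalent to b (exact) or a sub-concept of b (plugin)."""
    while a != b and a in PARENT:
        a = PARENT[a]
    return a == b

def match(c1, c2):
    """Concepts of c2 matched by at least one concept of c1."""
    return {b for b in c2 if any(cmatch(a, b) for a in c1)}

def relevant_io(concepts, services, kind):
    """Sequentially scan the registry, as in Fig. 2; kind is "in" or "out"."""
    relevant = []
    for service in services:
        if kind == "in" and match(concepts, service.inputs):
            relevant.append(service)
        elif kind == "out" and match(service.outputs, concepts):
            relevant.append(service)
    return relevant

registry = [
    Service("BookShop", frozenset({"Book", "CreditCard"}), frozenset({"Payment"})),
    Service("PriceLookup", frozenset({"Book"}), frozenset({"Price"})),
    Service("Mailer", frozenset({"Email"}), frozenset({"Receipt"})),
]
consumers = [s.name for s in relevant_io({"Novel"}, registry, "in")]
producers = [s.name for s in relevant_io({"Price"}, registry, "out")]
```

The linear scan makes the $O(w)$ number of calls to the matchmaker explicit; an indexed registry could answer the same two queries without visiting every service.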

4.3 Service Composition Graph Generation 

When the system receives a request, the Service Composition Graph Generator computes a graph with all the semantic relations between the relevant services for the request. A request is a set of input concepts, which represent the initial set of available inputs, and a set of output concepts, which are the outputs that the composite service should return. The service composition graph is a layered Directed Acyclic Graph (DAG), $G = (V, E)$, where:

• $V = W \cup C$ is the set of vertices of the graph, where $W$ is the set of services and $C$ the set of concepts (inputs and outputs).

• $E = CW \cup WC \cup CC$ is the set of edges in the graph, where:

- $CW \subseteq \{(c, w) \mid c, w \in V \land c \in C \land w \in W\}$ is the set of input edges, i.e., edges connecting input concepts to their services.

- $WC \subseteq \{(w, c) \mid w, c \in V \land w \in W \land c \in C\}$ is the set of output edges, i.e., edges connecting services with their output concepts.

- $CC \subseteq \{(c, c') \mid c, c' \in V \land c, c' \in C \land cmatch(c, c')\}$ is the set of edges that represent a semantic match between concepts.

This graph contains all the known services that could directly or indirectly be invoked given the provided inputs. The graph is divided into $N$ layers, whereby each layer $i$ contains all those services whose inputs are matched by the outputs produced in previous layers and which, therefore, are invokable at layer $i$. The graph is augmented with two layers, namely $L_0$ and $L_{N+1}$. $L_0$ contains the dummy service $w_I = \{\emptyset, I_R\}$ whereas $L_{N+1}$ contains the dummy service $w_O = \{O_R, \emptyset\}$. The first one is a service that provides as outputs the inputs of the request ($I_R$) and the last one has the goal outputs ($O_R$) as inputs. An example of a graph for $I_R$ = {BookTitle, BookAuthor, CreditCard, Email, Address} and $O_R$ = {Price, Payment, BookingCode} is shown in Fig. 3.

[Figure: layered composition graph with layers L0 to L4.]

Fig. 3. Composition graph example.


The first step of the composition graph construction is the calculation of the relevant services. These services can be easily calculated forwards, layer by layer, using the discovery mechanism previously presented. Fig. 4 shows an implementation of the forward composition graph generation algorithm for a request $R$. The algorithm selects all those services from the set of all available services $W$ that are input-relevant for the available concepts (availCon) in each layer using the relevantIO function (L. 8). Then, for each input-relevant service, the algorithm performs a match between the available concepts and the unmatched inputs of each service. All the inputs that are matched are removed from the unmatched set of inputs for the current service. If there are no unmatched inputs, then the service is invokable and thus is eligible for the current layer. For example, the first eligible services for the request shown in Fig. 3 are the services in layer $L_1$, which correspond to the services whose inputs are fully matched by $I_R$ (the set of concepts in $L_0$). The second eligible services are those services (placed in $L_2$) whose inputs are fully matched by the outputs of the previous layers, and so on. Note that instead of performing the invokability check by finding a full match between $C$ and the inputs of each service, we save those inputs of each service that have been matched before, and hence we only perform the match between the new outputs generated in the previous level (availCon) and the remaining unmatched inputs of each service ($U_{set}$). Hence, the set of unmatched inputs $U_{set}$ of each service decreases monotonically with each level (i.e., the unmatched inputs of each service always decrease when a new match is found, and the effect is propagated at each layer).

     1: function FWDGRAPH(R = {I_R, O_R}, W)
     2:     C := I_R; i := 0; L_0 := {w_I}; L := L_0
     3:     unmatchedIn := [ ]; availCon := I_R
     4:     W' := W
     5:     repeat
     6:         i := i + 1
     7:         L_i := ∅; W_selected := ∅
     8:         W_relevant := relevantIO(availCon, W, In)
     9:         availCon := ∅
    10:         for all w_i = {I_{w_i}, O_{w_i}} ∈ W_relevant do
    11:             U_set := unmatchedIn[w_i]
    12:             M_set := Match(availCon, U_set)
    13:             unmatchedIn[w_i] := U_set \ M_set
    14:             if unmatchedIn[w_i] = ∅ ∧ w_i ∉ L then
    15:                 W_selected := W_selected ∪ w_i
    16:                 availCon := availCon ∪ O_{w_i}
    17:             end if
    18:         end for
    19:         L_i := L_i ∪ W_selected
    20:         W' := W \ W_selected
    21:         C := C ∪ availCon
    22:     until (Match(C, O_R) = O_R) ∨ (L_i = ∅)
    23:     L := L ∪ {w_O}
    24: end function

Fig. 4. Algorithm for forward graph generation.

The complexity analysis for this algorithm (neglecting the optimisation

effect due to the propagation of the matched inputs, for simplification purposes) is $O(l \cdot w \cdot m \cdot n + l \cdot \frac{w}{k} \cdot m \cdot n)$, which can be simplified to $O(l \cdot m \cdot n \cdot w (1 + \frac{1}{k}))$. The first part corresponds to the complexity of the calls to the relevantIO function, which is invoked $l$ times (one call per layer), whereas the second part corresponds to the complexity of the for loop that checks the invokability of each input-relevant service. We can expect that only a small subset of the repository $W$ is relevant for the availCon generated in the previous layer. Thus, each call to the relevantIO function returns a small set of $w/k$ relevant services, where $k$ ($k \geq 1$) is a reduction factor that depends on the number of relevant services for a given set of concepts. This $k$ factor is different for each request and service registry. For example, if we assume $k = 100$ for a given problem with a service registry of 1,000 services, then each invocation of relevantIO(availCon, W, In) will return only 1% of the services of the repository ($w/k = 10$). Consider the following example of a composition over a repository with 1,000 services ($w = 1000$): there are $m = 5$ new output concepts generated and $n = 5$ unmatched concepts at each layer, the composition graph has 10 layers ($l = 10$), and in each layer the relevantIO function returns on average $w/k = 10$ services (that is, $k = 100$). The complexity in this example is $10 \cdot 1000 \cdot 5 \cdot 5$ for the first part plus $10 \cdot 10 \cdot 5 \cdot 5$ for the second part, which is $\approx 2.5 \cdot 10^5$ calls to the matchmaking system to compute all the required matches at the concept level.
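To make the layer-by-layer construction concrete, the following Python sketch builds the service layers for a small request. It is a simplified illustration, not the algorithm of Fig. 4 verbatim: it assumes exact concept matching (set intersection), keeps the concepts produced by the previous layer in a separate variable (`avail`) while the new ones are collected, and uses hypothetical service names.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Service(NamedTuple):
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def forward_layers(request_inputs: Iterable[str], request_outputs: Iterable[str],
                   services: List[Service]) -> List[Set[str]]:
    """Return the service layers L1..LN as sets of service names."""
    goal = set(request_outputs)
    unmatched = {s.name: set(s.inputs) for s in services}  # unmatched inputs per service
    known = set(request_inputs)   # every concept available so far (C)
    avail = set(request_inputs)   # concepts produced by the previous layer (availCon)
    placed: Set[str] = set()
    layers: List[Set[str]] = []
    while True:
        layer: Set[str] = set()
        new_concepts: Set[str] = set()
        for s in services:
            if s.name in placed or not (avail & s.inputs):
                continue                      # not input-relevant for this layer
            unmatched[s.name] -= avail        # only the remaining inputs are checked
            if not unmatched[s.name]:         # fully matched, hence invokable
                layer.add(s.name)
                new_concepts |= s.outputs
        if not layer:
            break                             # no new service can be invoked
        placed |= layer
        layers.append(layer)
        known |= new_concepts
        avail = new_concepts
        if goal <= known:
            break                             # all goal outputs are reachable
    return layers


# Hypothetical repository; GeoLoc can never be invoked because Address is missing.
repo = [
    Service("BookSearch", frozenset({"BookTitle"}), frozenset({"ISBN"})),
    Service("PriceLookup", frozenset({"ISBN"}), frozenset({"Price"})),
    Service("Pay", frozenset({"Price", "CreditCard"}), frozenset({"Payment"})),
    Service("GeoLoc", frozenset({"Address"}), frozenset({"Zip"})),
]
layers = forward_layers({"BookTitle", "CreditCard"}, {"Payment"}, repo)
```

In this run, `Pay` has its `CreditCard` input matched in the first iteration but only becomes invokable in the third one, once `Price` is produced, which is the incremental bookkeeping of unmatched inputs described above.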

4.3.1 Index-Based Optimisations 

Although these improvements can save search time, one of the bottlenecks of the graph generation is still the size of the repository $w$, which is usually some orders of magnitude bigger than the other parameters involved in the complexity. One effective way to reduce the impact of the size of the repository is precalculating and indexing the input-relevant set of services for each concept of the ontology. The indexing of services can be done independently of any composition request, as it only depends on the information available, such as the services themselves and the ontologies.

The construction of an inverted index to recover input-relevant or output-relevant services can be done easily using the relevantIO function. The main idea behind the inverted index is to build a key-value hash map where the keys are the concepts of the ontology and the values are those services that are input-relevant (or output-relevant) for that concept. This map allows relevant services to be discovered in constant time during the graph generation.
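A minimal Python sketch of such an inverted index is given below. The toy taxonomy (`SUBCLASS_OF`), the `match` function and the service names are assumptions made for the example; a real deployment would obtain matches from the matchmaker and would also fall back to the non-indexed discovery for concepts missing from the index, which is omitted here.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: Novel is a kind of Book.
SUBCLASS_OF = {"Novel": "Book"}


def match(provided, required):
    """Assumed matchmaker: equal concepts or (transitive) subclass match."""
    while provided is not None:
        if provided == required:
            return True
        provided = SUBCLASS_OF.get(provided)
    return False


def build_input_index(services, concepts, match):
    """Precompute, for every concept, the services with an input it can match.

    services: dict mapping a service name to an (inputs, outputs) pair.
    """
    index = defaultdict(set)
    for concept in concepts:
        for name, (inputs, _outputs) in services.items():
            if any(match(concept, i) for i in inputs):
                index[concept].add(name)
    return dict(index)


def relevant_inputs_indexed(avail_concepts, index):
    """One dictionary access per available concept, independent of |W|."""
    relevant = set()
    for c in avail_concepts:
        relevant |= index.get(c, set())
    return relevant


services = {
    "BookInfo": ({"Book"}, {"ISBN"}),
    "PriceLookup": ({"ISBN"}, {"Price"}),
}
index = build_input_index(services, {"Book", "Novel", "ISBN", "Price"}, match)
found = relevant_inputs_indexed({"Novel", "ISBN"}, index)
```

Building the index costs one pass over concepts and services, but that cost is paid once and is independent of any composition request.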

We define a new function relevantIO', which is the cached version of the original function. Instead of computing the relevance by directly using the matchmaking system, it first checks whether the concept is cached in the inverted index. If the concept is in the index, then the set of relevant services is immediately returned (constant time). If not, the call is delegated to the relevantIO function. Assuming there is enough memory to keep the entire index, the index provides the relevant services in $O(1)$ for each concept during the forward graph generation. Thus, we reduce the complexity associated with the parameter $w$. Concretely, since we can obtain in constant time the input-relevant services for each concept, the complexity of relevantIO(availCon, W, In) now depends only on the number of concepts in availCon (one access to the index per concept). Having $m = |availCon|$ (number of new concepts at each layer), the complexity using indexes is $O(l \cdot m + l \cdot \frac{w}{k} \cdot m \cdot n)$, simplified to $O(l \cdot m (1 + \frac{w}{k} \cdot n))$. The use of indexes to discover relevant services during the forward graph generation has a high impact on the global performance. Using the same example as before, with $w = 1000$, $l = 10$, $m = 5$, $n = 5$ and $k = 100$, we have $10 \cdot 5 (1 + \frac{1000}{100} \cdot 5) = 2.55 \cdot 10^3$, 2 orders of magnitude lower than the non-indexed version.
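The two worked examples can be checked with a few lines of Python. The function names are ours, and the sketch assumes that $k$ divides $w$ exactly, as it does in the example.

```python
def operations_linear(layers, w, m, n, k):
    """l*w*m*n for the repository scans plus l*(w/k)*m*n for the invokability loop."""
    return layers * w * m * n + layers * (w // k) * m * n


def operations_indexed(layers, w, m, n, k):
    """l*m index accesses plus l*(w/k)*m*n for the invokability loop."""
    return layers * m + layers * (w // k) * m * n


linear = operations_linear(layers=10, w=1000, m=5, n=5, k=100)
indexed = operations_indexed(layers=10, w=1000, m=5, n=5, k=100)
ratio = linear / indexed
```

With these values `linear` is 252,500 and `indexed` is 2,550, so the ratio is roughly 99, in line with the two orders of magnitude stated above.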

4.4 Graph-Based Optimisations 

Once the graph is generated, the next step is to apply different optimisations to reduce the graph size in order to improve the performance of the optimal composition search. This part of the composition is independent of the discovery phase. All the information required to search for the optimal composition is in the graph, namely, the relevant services and the semantic relations between their inputs and outputs, so there is no need to communicate with the discovery/matchmaking systems. We distinguish at least two different techniques [22]: backward pruning and interface dominance.

4.4.1 Backward pruning 

As explained earlier, the generation of the composition graph with the relevant services is done forwards, layer by layer. During this forward expansion of the graph, we are not interested in invoking services that have no explicit effects on the composition, that is, services that do not contribute to the output goals. When the graph is completed and the goal outputs are reached, a backward pruning is performed to remove all non-contributing services. A non-contributing service is essentially a service that is not contained in the transitive closure set of the output-relevant services. A service $w' = \{In_{w'}, Out_{w'}\}$ is output-relevant for a service $w = \{In_w, Out_w\}$ if $Out_{w'} \otimes In_w \neq \emptyset$ (as defined previously). Thus, the set of all output-relevant services for a service $w$ can be defined as:

$\lambda(w) = \{w' \in W \mid Out_{w'} \otimes In_w \neq \emptyset\}$ (1)

Recursively, we can define the set $\lambda^2(w) = \lambda(\lambda(w))$ as the set of output-relevant services at distance two. Extending this, the transitive closure of the output-relevant services can be defined as:

$\Lambda(w) = \lambda(w) \cup \lambda^2(w) \cup \lambda^3(w) \cup \cdots$ (2)

Therefore, we can say that all those services of the graph that are not in the transitive closure of the output-relevant services $\Lambda$ are not contributing to the composition goals, directly or indirectly, and can therefore be removed from the graph.

An example of this can be seen in Fig. 3. Starting from the last layer, we compute the transitive closure of the service $w_O$, which is a dummy service that represents the goal outputs. The output-relevant services for $w_O$ at distance one are $\lambda(w_O) = \{w_6, w_7, w_8, w_9\}$, since $Out_{w_6} \otimes In_{w_O} \neq \emptyset$ and the same holds for $w_7$, $w_8$ and $w_9$. We now calculate the output-relevant services at distance two, $\lambda(\lambda(w_O)) = \lambda(\{w_6, w_7, w_8, w_9\})$, which can be simply computed as the union $\lambda(w_6) \cup \lambda(w_7) \cup \lambda(w_8) \cup \lambda(w_9)$, that is, $\{w_1, w_2, w_3\}$. Repeating this, we finally have $\Lambda = \{w_6, w_7, w_8, w_9\} \cup \{w_1, w_2, w_3\} \cup \{w_I\}$, where $w_I$ is the dummy service omitted in Fig. 3 that provides the input concepts of the request (concepts in $L_0$). The remaining services ($w_4$ = MoviesDB Service, $w_5$ = GeoLoc WS, and Zip Search) are not in $\Lambda$; they do not contribute to the goals and can be removed from the graph.
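Backward pruning amounts to a reachability computation over the "is provided by" relation. The Python sketch below illustrates it. The `providers` map plays the role of $\lambda$; the union at each distance follows the example above, but the individual entries for each service are hypothetical, since the text only gives the unions.

```python
def contributing_services(providers, goal):
    """Transitive closure of output-relevant services, starting from the goal service.

    providers: dict mapping a service to the set of services whose outputs
               match at least one of its inputs.
    """
    closure = set()
    frontier = [goal]
    while frontier:
        current = frontier.pop()
        for provider in providers.get(current, ()):
            if provider not in closure:
                closure.add(provider)
                frontier.append(provider)
    return closure


# Hypothetical per-service entries, chosen to agree with the unions in the example.
providers = {
    "wO": {"w6", "w7", "w8", "w9"},
    "w6": {"w1", "w2"}, "w7": {"w1"}, "w8": {"w3"}, "w9": {"w2", "w3"},
    "w1": {"wI"}, "w2": {"wI"}, "w3": {"wI"}, "w4": {"wI"}, "w5": {"wI"},
}
closure = contributing_services(providers, "wO")
pruned = set(providers) - closure - {"wO"}
```

Each service is visited at most once, so the pruning is linear in the number of edges of the graph and needs no call to the discovery or matchmaking components.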

4.4.2 Interface Dominance 

Another strategy to reduce the graph size is to analyse the equivalence and dominance of some services over others in terms of the interface they offer. It is very frequent to find services from different providers that offer similar functionality with overlapping interfaces. In scenarios like this, it is easy to end up with large composition graphs that make it very hard to find optimal compositions in reasonable time. One way to attack this problem is to analyse the interface dominance between services in order to find those that are equivalent to or better than others in terms of the interface they provide.

Definition 10: Given a concept in a composition graph $G$ ($c \in G$), we denote $\Phi(c)$ as a function that returns the set of output-relevant services for concept $c$:

$\Phi(c) = \{w = \{In_w, Out_w\} \in G \mid Out_w \otimes \{c\} = \{c\}\}$ (3)

For instance, $\Phi(Payment)$ in Fig. 3 is $\{w_8, w_9\}$, since $Out_{w_8} \otimes \{Payment\} = \{Payment\}$ and $Out_{w_9} \otimes \{Payment\} = \{Payment\}$; that is, concept Payment is matched by an output from $w_8$ (PaymentID) and by an output from $w_9$ (PayNum).

Definition 11: A service $w_i = \{In_{w_i}, Out_{w_i}\} \in G$ is input-equivalent ($In_{w_i} \equiv In_{w_j}$) with respect to a service $w_j = \{In_{w_j}, Out_{w_j}\} \in G$ in the composition graph $G$ if:

$\bigcup_{c_i \in In_{w_i}} \{\Phi(c_i)\} = \bigcup_{c_j \in In_{w_j}} \{\Phi(c_j)\}$ (4)

That is, the set of sets defined by the union of $\Phi(c)$ for each input concept $c$ of each service must be equal. This definition formalises the idea of input equivalence of two services of the composition graph regarding the relation between their inputs and the services that match those inputs. That means that two services $w_i$ and $w_j$ of the graph are input-equivalent if the services that provide the inputs of both services are the same.

Definition 12: A service $w_i = \{In_{w_i}, Out_{w_i}\} \in G$ is input-dominant ($In_{w_i} \succ In_{w_j}$) with respect to a service $w_j = \{In_{w_j}, Out_{w_j}\} \in G$ in the composition graph $G$ if:

$\bigcup_{c_i \in In_{w_i}} \{\Phi(c_i)\} \subset \bigcup_{c_j \in In_{w_j}} \{\Phi(c_j)\}$ (5)

Thus, informally, a service is input-dominant if it only needs a subset of the information required by the dominated service to be invoked. For example, in Fig. 3, a service whose inputs are all provided by $\{w_1, w_2\}$ is input-dominant with respect to a service whose inputs are provided by $\{w_1, w_2\}$, $\{w_1\}$ and $\{w_3\}$, since $\{\{w_1, w_2\}\} \subset \{\{w_1, w_2\}, \{w_1\}, \{w_3\}\}$.

Definition 13: Given a concept in a composition graph $G$ ($c \in G$), we denote $\Psi(c)$ as the function that returns the set of input concepts in $G$ that are matched by $c$, that is, there exists an arc from $c$ to $c'$ in $G$.

$\Psi(c) = \{c' \mid (c, c') \in G\}$ (6)

Definition 14: A service $w_i = \{In_{w_i}, Out_{w_i}\} \in G$ is output-equivalent ($Out_{w_i} \equiv Out_{w_j}$) with respect to a service $w_j = \{In_{w_j}, Out_{w_j}\} \in G$ in the composition graph $G$ if:

$\bigcup_{c_i \in Out_{w_i}} \Psi(c_i) = \bigcup_{c_j \in Out_{w_j}} \Psi(c_j)$ (7)

That is, two services are output-equivalent if their outputs are matched to the same input concepts in the graph, which means that their outputs can be consumed in the same way by the same services in $G$.

Definition 15: A service $w_i = \{In_{w_i}, Out_{w_i}\} \in G$ is output-dominant ($Out_{w_i} \succ Out_{w_j}$) with respect to a service $w_j = \{In_{w_j}, Out_{w_j}\} \in G$ if:

$\bigcup_{c_i \in Out_{w_i}} \Psi(c_i) \supset \bigcup_{c_j \in Out_{w_j}} \Psi(c_j)$ (8)

Therefore, one service is output-dominant with respect to another service of the graph $G$ if their outputs match the same inputs of the same services in the composition graph but the dominant service also provides additional outputs to the same or different services.


Definition 16: A service $w_i = \{In_{w_i}, Out_{w_i}\}$ is interface-equivalent to a service $w_j = \{In_{w_j}, Out_{w_j}\}$ ($w_i \equiv w_j$) if $In_{w_i} \equiv In_{w_j}$ and $Out_{w_i} \equiv Out_{w_j}$, that is, both are input-equivalent and output-equivalent.

Definition 17: A service $w_i$ interface-dominates a service $w_j$ ($w_i \succ w_j$) if the first dominates the second in at least one aspect (input-dominant or output-dominant) and is at least equivalent in the other aspect. Formally, $w_i \succ w_j$ if $(In_{w_i} \succ In_{w_j} \land Out_{w_i} \succ Out_{w_j}) \lor (In_{w_i} \equiv In_{w_j} \land Out_{w_i} \succ Out_{w_j}) \lor (In_{w_i} \succ In_{w_j} \land Out_{w_i} \equiv Out_{w_j})$.

This dominance definition can be generalised to include more features, such as preconditions, effects, or non-functional properties like QoS:

Definition 18: A service with multiple properties $w_i = \{P^1_{w_i}, P^2_{w_i}, \ldots, P^n_{w_i}\}$, where $P^1_{w_i}$ are the inputs, $P^2_{w_i}$ the outputs and the rest of the parameters are different properties, dominates another service $w_j$ ($w_i \succ w_j$) with parameters $P_{w_j} = \{P^1_{w_j}, \ldots, P^n_{w_j}\}$ if $\forall k \in \{1, \ldots, n\}\; P^k_{w_i} \succeq P^k_{w_j} \land \exists k \in \{1, \ldots, n\},\; P^k_{w_i} \succ P^k_{w_j}$.

The interface dominance optimisation reduces the size of the composition graph by substituting the original services of the graph with abstract interfaces that capture the functionality of the dominant or equivalent services. By minimising the graph size we improve the performance of the search algorithms, since they only explore a reduced search space. Once the search is performed and the optimal composition workflow is generated, a post-processing step can be used to replace the abstract service interfaces with specific implementations using the original dominant/equivalent services or combinations of dominated services that satisfy the same functionality as the dominant service.
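The dominance relations can be sketched in Python in terms of the functions $\Phi$ and $\Psi$, both represented here as plain dictionaries. The sketch assumes strict set inclusion for dominance and uses hypothetical concept and service names; the input side of the example mirrors the sets $\{\{w_1, w_2\}\}$ and $\{\{w_1, w_2\}, \{w_1\}, \{w_3\}\}$ used above.

```python
def input_signature(service, phi):
    """Set of provider sets, one per input concept (union of {Phi(c)})."""
    return frozenset(frozenset(phi[c]) for c in service["in"])


def output_signature(service, psi):
    """Union of the input concepts matched by each output (union of Psi(c))."""
    matched = set()
    for c in service["out"]:
        matched |= psi.get(c, set())
    return frozenset(matched)


def interface_dominates(a, b, phi, psi):
    """True if `a` dominates `b` in one aspect and is at least equivalent in the other."""
    ia, ib = input_signature(a, phi), input_signature(b, phi)
    oa, ob = output_signature(a, psi), output_signature(b, psi)
    in_dom, in_eq = ia < ib, ia == ib        # needs strictly fewer provider sets
    out_dom, out_eq = oa > ob, oa == ob      # feeds strictly more input concepts
    return (in_dom and out_dom) or (in_eq and out_dom) or (in_dom and out_eq)


# Hypothetical graph fragment.
phi = {"c1": {"w1", "w2"}, "c2": {"w1"}, "c3": {"w3"}}   # concept -> providers
psi = {"o1": {"x"}, "o2": {"x"}}                          # output -> matched inputs
a = {"in": {"c1"}, "out": {"o1"}}
b = {"in": {"c1", "c2", "c3"}, "out": {"o2"}}
a_dominates_b = interface_dominates(a, b, phi, psi)
b_dominates_a = interface_dominates(b, a, phi, psi)
```

Here `a` is input-dominant and output-equivalent with respect to `b`, so `b` could be collapsed into the abstract interface of `a` before the search.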

4.5 Optimal Composition Search 

The previous optimisations are intended to reduce the size of the composition graph while keeping the same functionality.
The next step is to perform a search over the graph 
to find the best composition among all the possible 
compositions that satisfy the input/output request. The 
search can be designed to optimise different criteria, such 
as the number of services, the execution path length or 
QoS properties. Typically, the search over the graph can 
be done forwards or backwards. In the first case, the 
composition starts from the inputs of the request (first 
layer), selecting invokable services until the goal outputs 
are obtained, whereas the second case starts with the 
goal outputs (last layer), selecting relevant services for 
the outputs until a composition that can be invoked with 
the initial inputs is found. 

Formally, the composition search can be modelled as a state-transition system, where the problem is divided into a set of states and transitions between states. A state-transition system is defined as a 3-tuple $\Sigma = (S, A, \gamma)$, where:

• $S = \{s_1, s_2, \ldots\}$ is a finite set of states.

• $A = \{a_1, a_2, \ldots\}$ is a finite set of actions.

• $\gamma : S \times A \to S$ is a state-transition function.

Using the concept of the state-transition system, the state space search problem can be defined as $P = (\Sigma, s_0, \mathcal{G})$, where $s_0 \in S$ is the initial state and $\mathcal{G} \subseteq S$ is a set of goal states.

The state-transition system $\Sigma$ allows the search to navigate through the set of states by applying different actions, where each action may be associated with a cost that we want to minimise. The state representation may vary depending on the strategy used. Typically, in the case of the backward search, the state will contain the information of the unsatisfied concepts at each state, starting with the goal outputs. The goal then is to find a succession of actions $(a_1, a_2, \ldots, a_n)$ with the minimum cost that leads from the initial state, where unsatisfied concepts = goal outputs, to the goal state, where unsatisfied concepts = $\emptyset$, that is, there are no unsatisfied concepts and the composition is invokable. The available transitions between states are given by the actions applicable to each state, i.e., the output-relevant services that can be selected to resolve all the unsatisfied concepts.

Given a composition graph $G = (V, E)$ as defined previously, where $V = W \cup C$ is the set of vertices, which are the services and the concepts (inputs/outputs) of the graph, the state-transition system $\Sigma$ for the (backward) composition problem is defined as follows:

• $S \subseteq 2^{C}$, where $C$ is the set of all concepts in the composition graph, i.e., a state is a set of concepts of the graph, $s = \{c_1, \ldots, c_n\}$.

• $A \subseteq 2^{W}$, where $W$ is the set of services in the composition graph, i.e., an action is a set of services from the graph, $a = \{w_1, \ldots, w_n\}$.

• $\gamma(s, a) = (s - \bigcup \{\Psi(c_i) \mid c_i \in Out(a)\}) \cup In(a)$, i.e., the application of an action $a = \{w_1, \ldots, w_n\}$ to a state $s = \{c_1, \ldots, c_n\}$ generates a new state where all concepts that are matched by the outputs of the services of the action are removed, and the inputs of the services of the action are added as the new unsatisfied concepts. Functions $In(a)$ and $Out(a)$ return the union of the input concepts and the union of the output concepts of the services in $a$, respectively.

The initial state $s_0$ of the backward composition problem $P = (\Sigma, s_0, \mathcal{G})$ is defined as $s_0 = In_{w_O}$, i.e., the input concepts of the output dummy service. For example, in Fig. 3 the initial state consists of the concepts that must be provided to $w_O$. On the other hand, there is just one goal state, $\mathcal{G} = \{s_g = \emptyset\}$, i.e., the goal state is reached when there are no unsatisfied concepts in the composition.

The efficiency of the search can also be improved using search optimisations depending on the search strategy followed. These optimisations can be applied to the available actions for each state by pruning actions that lead to dead-ends, actions that are equivalent, or actions that are dominated (cannot lead to a better solution).
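The backward state-space search can be illustrated with the short Python sketch below. It is a deliberately simplified variant and not the search used in the reference implementation: it uses uniform-cost search (no heuristic), each action is a single service rather than a set of services, concepts are matched by equality, and the request inputs are treated as already satisfied, which plays the role of the dummy service $w_I$. All service names are hypothetical.

```python
import heapq
from itertools import count


def backward_search(goal_outputs, request_inputs, services):
    """Return a list of service names (in execution order) or None.

    services: dict mapping a service name to an (inputs, outputs) pair of frozensets.
    A state is the frozenset of still unsatisfied concepts; the goal state is empty.
    """
    given = frozenset(request_inputs)
    start = frozenset(goal_outputs) - given
    tie = count()                                # tie-breaker: sets are never compared
    queue = [(0, next(tie), start, ())]
    seen = set()
    while queue:
        cost, _, state, plan = heapq.heappop(queue)
        if not state:
            return list(reversed(plan))          # plan was built from the goal backwards
        if state in seen:
            continue
        seen.add(state)
        for name, (inputs, outputs) in services.items():
            if not (state & outputs):
                continue                         # not output-relevant for this state
            next_state = frozenset((state - outputs) | (inputs - given))
            heapq.heappush(queue, (cost + 1, next(tie), next_state, plan + (name,)))
    return None


services = {
    "BookSearch": (frozenset({"BookTitle"}), frozenset({"ISBN"})),
    "PriceLookup": (frozenset({"ISBN"}), frozenset({"Price"})),
    "Pay": (frozenset({"Price", "CreditCard"}), frozenset({"Payment"})),
    # Needs Address, which nobody provides: a dead-end branch.
    "PremiumPay": (frozenset({"Price", "CreditCard", "Address"}), frozenset({"Payment"})),
}
plan = backward_search({"Payment"}, {"BookTitle", "CreditCard"}, services)
```

The branch through `PremiumPay` is explored but never reaches the empty state, which is the kind of dead-end that the pruning optimisations mentioned above aim to discard early.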

5 Reference Implementation 

We developed a reference implementation of the integrated graph-based composition framework that is based on two main components: iServe, a service warehouse with advanced discovery support which provides the service registry and takes care of the matchmaking and service discovery activities, and ComposIT [22], which is in charge of the graph-based composition part.

Fig. 5 depicts the architecture of the system. In a nutshell, the composition process is carried out as follows. When a composition request is sent to the system through the Web UI, ComposIT starts computing the composition graph with all the relevant services for the request. To this end, all the relevant services are discovered layer by layer using the fine-grained I/O logic-based discovery support provided by the Semantic Discovery Engine of iServe. This engine relies on the Service Manager and the KB Manager to retrieve the relevant services using semantic reasoning capabilities. During the composition graph generation, ComposIT also makes intensive use of the KB Manager in order to carry out concept-level matching and consequently figure out how the inputs and outputs of the services obtained can be connected. Once the composition graph is generated, ComposIT applies the backward pruning and the interface dominance optimisations to reduce the graph size. These optimisations are applicable using only the information contained in the graph, and thus there is no need to interact with the discovery component. Finally, an optimal search is performed over the graph using a backward algorithm that extracts the optimal composition from the graph.

In the next sections we cover in more detail the inner workings of iServe and ComposIT, respectively.

5.1 iServe 

iServe (see the right-hand side of Fig. 5) is a service warehouse whose functionality includes the core service registry anchored on Linked Data principles, semantic reasoning support, advanced discovery functionality, and further analysis components able to assist in automatically locating and generating semantic service descriptions out of Web resources. For the purposes of this work we have essentially exploited the registry and discovery functionality.

The service discovery functionality builds on top of the Storage Access Layer, which is in charge of managing the registry's data, which includes Service descriptions, related documents and the corresponding Ontologies. This layer essentially provides RDF/S and OWL storage and reasoning support, document storage, as well as basic crawling facilities to automatically obtain referenced Ontologies. RDF/S and OWL storage and reasoning support is delegated to dedicated engines which are accessed by means of the SPARQL 1.1 standard. Therefore, the reasoning capabilities depend largely on the actual configuration of the store. Concretely, the discovery infrastructure contacts the Service Manager to list services given basic criteria such as the input and output types provided, and the KB Manager to obtain concepts, properties, and their sub- or super-concepts. Depending on their implementation, the Service and KB Managers combine internal indexes with SPARQL queries issued to the triple store by means of Jena.

Fig. 5. ComposIT / iServe architecture

Services are imported to iServe using a range of transformation engines able to import service descriptions in a variety of formalisms including SAWSDL, WSMO-Lite, OWL-S, and MicroWSMO. These plugins generate descriptions expressed in terms of a simple RDF/S model, the Minimal Service Model (MSM), which essentially captures the intersection of existing service description formalisms. By means of these transformations iServe provides a homogeneous description for services that were originally annotated using heterogeneous means.

Given that, as we saw earlier, the response time of the overall composition is highly dependent on the performance of the service discovery and concept matchmaking tasks, we extended iServe with various implementations of the Service and Knowledge Base Managers. We tested different configurations to study their individual performance and the overall impact on composition response times. In particular, we used the following configurations:

1) SPARQL D/M: pure SPARQL Discovery/Matchmaking, where all interactions with the Service and Knowledge Base Managers are directly implemented as SPARQL queries. This is the typical approach of discovery engines and was the original implementation of iServe.

2) Index. D/SPARQL+Cache M: I/O service discovery is based on an index. We additionally used an intermediate cache at the level of the concept matcher in order to avoid issuing recurrent SPARQL queries.

3) Full Indexed D/M: both service discovery and concept matchmaking rely on local indexes pre-populated at load time (and updated with writes). In this configuration, service discovery and concept matchmaking do not need to issue any SPARQL query to the backend.
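The idea behind the intermediate cache of the second configuration can be sketched with standard memoisation. The `CountingBackend` class below is a hypothetical stand-in for the triple store (it is not the iServe API) and simply counts how many queries it receives.

```python
from functools import lru_cache


class CountingBackend:
    """Hypothetical stand-in for a remote store answering subclass queries."""

    def __init__(self, subclass_of):
        self.subclass_of = subclass_of
        self.queries = 0

    def is_subclass(self, child, parent):
        self.queries += 1                      # each call models one remote query
        while child is not None:
            if child == parent:
                return True
            child = self.subclass_of.get(child)
        return False


def cached_matcher(backend):
    """Wrap the backend so that repeated concept pairs are answered locally."""
    @lru_cache(maxsize=None)
    def match(provided, required):
        return backend.is_subclass(provided, required)
    return match


backend = CountingBackend({"Novel": "Book"})
match = cached_matcher(backend)
answers = [match("Novel", "Book") for _ in range(3)]
```

Only the first of the three identical calls reaches the backend. Since the same concept pairs tend to recur during graph generation, this is the kind of saving the cache targets.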


5.2 ComposIT 

ComposIT [22], depicted on the left-hand side of Fig. 5, is the semantic Web service composition engine we rely on. It implements all the different graph-based composition phases of the framework described in Sec. 4. The semantic service discovery and matchmaking mechanisms, which originally were directly implemented internally, are delegated to iServe by means of integration adapters implemented for the purposes of this work. ComposIT nonetheless uses an internal cache and an index to efficiently recover the information of the generated composition graph. It is worth noting that the architecture supports the deployment of multiple, distributed iServe instances to provide different endpoints that can be used by ComposIT in the composition phase by aggregating the results of the registries at the ComposIT API level. Indeed, since the services to contemplate at composition time are identified by the remote registry and we just use them directly, composing this set of services out of just one API call or several calls in parallel (one per registry) is a trivial change. The overall response time analysis would still remain unchanged, and would have an upper bound determined by the slowest registry. This also applies to other third-party discovery engines as long as they support fine-grained I/O discovery queries as described in Sec. 4.2. The integration of these third-party registries could be achieved by developing interface adapters (with capabilities to retrieve input- and output-relevant services) which could be plugged in to the system, keeping the generation of the composition graph isolated from the concrete registries used.
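Aggregating several registries in parallel can be sketched as follows. The registries are modelled as plain Python callables returning sets of service identifiers; this interface and the example data are assumptions made for illustration and do not correspond to the actual ComposIT or iServe adapters.

```python
from concurrent.futures import ThreadPoolExecutor


def discover_from_all(registries, concepts):
    """Query every registry concurrently and merge the relevant services.

    registries: list of callables taking a set of concepts and returning a set
                of service identifiers (hypothetical adapter interface).
    """
    merged = set()
    with ThreadPoolExecutor(max_workers=max(1, len(registries))) as pool:
        # map() yields results in submission order; total time is bounded by
        # the slowest registry rather than by the sum of all of them.
        for result in pool.map(lambda registry: registry(concepts), registries):
            merged |= result
    return merged


# Two hypothetical registries backed by in-memory dictionaries.
registry_a = lambda concepts: {"BookSearch"} if "BookTitle" in concepts else set()
registry_b = lambda concepts: {"Pay"} if "CreditCard" in concepts else set()
relevant = discover_from_all([registry_a, registry_b], {"BookTitle", "CreditCard"})
```

Because the merge happens on the caller's side, the graph generation code only sees a single set of relevant services, regardless of how many registries contributed to it.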

The generated composition graph can contain different compositions with the same or different length (number of layers) and with a different number of services, depending on the services that have been selected to generate the needed data. Among the different combinations that can be obtained, the goal of ComposIT is to find the shortest service composition with the minimum number of services. For this purpose, ComposIT searches for the optimal composition by carrying out a heuristic search based on the A* algorithm. This search was implemented using Hipster4j to identify a minimal subset of the services from the graph that satisfy the request (in terms of inputs and outputs). Note that multiple compositions can be extracted from the composition graph since there may be different services that generate outputs of the same concept.

6 Evaluation 

In this section we present a quantitative evaluation of our approach. The purposes of the evaluation are: 1) to measure the scalability of the approach with many services; 2) to study the impact of the discovery on the overall composition performance; and 3) to compare the performance with different optimisations.

In order to perform a standard and comparable evaluation, we selected the Web Service Challenge 2008 (WSC'08) service datasets. These datasets allow us to measure the scalability with an increasingly large set of services (from 158 to 8,119 services). Services were imported to iServe using a specific transformer plugin which translates each service description in the WSC'08 XML format into MSM, and the XML concept taxonomy into an equivalent OWL representation. iServe is responsible for identifying, loading and reasoning with the ontologies used in the service descriptions. Data types of the inputs and outputs of service descriptions are linked to their corresponding semantic concepts through the modelReference property of the MSM, which points to the concepts defined in the transformed OWL model.

Experiments were run under Ubuntu 10.04 64-bit on a PC with an Intel Core 2 Duo E6550 at 2.33 GHz and 4 GB of RAM. OWLIM-Lite 5.3 with OWL Horst reasoning was chosen in iServe as the RDF triple store for the semantic registries and deployed within Tomcat 7.

TABLE 1

Characteristics of the WSC'08 datasets.

Dataset      #Serv.   #Con.    #Serv.Sol.   Length
WSC'08 01    158      1,540    10           3
WSC'08 02    558      1,565    5            3
WSC'08 03    604      3,089    40           23
WSC'08 04    1,041    3,135    10           5
WSC'08 05    1,090    3,067    20           8
WSC'08 06    2,198    12,468   40           9
WSC'08 07    4,113    3,075    20           12
WSC'08 08    8,119    12,337   30           20

Table 1 shows the characteristics of each WSC'08 dataset. The number of services and concepts in the ontology of each dataset are shown in columns #Serv. and #Con., respectively. The quality of the solutions is based on the number of services and the length (i.e., number of layers) of the composition. The optimal solution quality for each dataset (according to the WSC'08 competition) is shown in columns #Serv.Sol. and Length.

Experimentation was done using the configurations explained in Sec. 5.1 with one instance of iServe in order to measure the effect of the Discovery/Matchmaking over the whole composition process. Results with each configuration are shown in Table 2. The second column shows the size (number of services) of the resulting composition graph for each dataset. The next columns show the time taken to generate the composition graph (G. time) in seconds and the number of SPARQL queries generated during that process. The last three columns show the size of the graph after the graph-based optimisations, the time of the composition search (graph optimisations + optimal A* backward search) and the number of services and length of the optimal composition found. Note that the backward optimal search does not depend on the configuration selected since it only uses the information in the composition graph.



[Figure: bar chart with one bar per WSC'08 dataset; series: Full Indexed Discovery/Matchmaking (D/M) and Search.]

Fig. 6. Graph generation time vs. search time for the Full Indexed Discovery/Matchmaking configuration.

The analysis of these results reveals that the discovery
and matchmaking phases take most of the time of the
composition, even using the optimal configuration (Full
Indexed D/M) to avoid the latency of the SPARQL queries.
This is graphically represented in Fig. 6. This figure
shows the overall composition time for each dataset
including the relative time of the Full Indexed D/M
(blue bar) and the Composition Search (red bar). The Full
Indexed D/M takes 77% of the total composition time
on average. This percentage is even higher (about 99%)
if the discovery and matchmaking are not optimised
using indexes and cache. In other words, as anticipated
by the complexity analysis presented earlier, discovery
and matchmaking are responsible for the majority of
the computation that needs to be performed to compose
services. Optimising both phases is thus fundamental.
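
The 77% figure can be checked against Table 2 with a short calculation. The sketch below assumes that the total composition time is the sum of the graph generation time (G. time) and the composition search time (Comp. time) of the Full Indexed D/M configuration, and that the average is an unweighted mean over the eight datasets. Both are assumptions about how the figure was derived, not definitions stated in the text.

```python
# (G. time, Comp. time) in seconds for the Full Indexed D/M
# configuration, copied from Table 2.
FULL_INDEXED = {
    "WSC'08-01": (0.18, 0.08),
    "WSC'08-02": (0.38, 0.07),
    "WSC'08-03": (0.69, 0.21),
    "WSC'08-04": (0.60, 0.12),
    "WSC'08-05": (0.74, 0.18),
    "WSC'08-06": (1.12, 1.05),
    "WSC'08-07": (1.33, 0.23),
    "WSC'08-08": (1.48, 0.34),
}


def dm_share(graph_time: float, search_time: float) -> float:
    """Fraction of the total time spent on discovery/matchmaking.

    Assumption: total time = graph generation time + search time.
    """
    return graph_time / (graph_time + search_time)


shares = {name: dm_share(g, s) for name, (g, s) in FULL_INDEXED.items()}
average_share = sum(shares.values()) / len(shares)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"average: {average_share:.1%}")
```

Under these assumptions the unweighted mean is about 76.5%, which rounds to the 77% reported above. WSC'08-06 has the lowest share, because its search time (1.05 s) is close to its graph generation time (1.12 s).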

The comparison of the scalability of the three
configurations with respect to the number of services is
shown in Fig. 7. As can be seen, directly querying the
backend (see SPARQL D/M), which is the approach
followed by most discovery engines, rapidly becomes
prohibitively slow, taking 1,656 seconds (i.e., 27.6 min) in
the largest dataset. Indeed, the generation of the
composition graph requires computing every semantic match
between all inputs and outputs as well as discovering
relevant services at each layer. Doing so leads to issuing
thousands of SPARQL queries. This can be dramatically
improved using a discovery index and a local cache for
the matchmaking system as can be seen in the second

TABLE 2

Evaluation results with different Discovery/Matchmaking (D/M) configurations with the WSC’08 datasets

                      Discovery/Matchmaking (D/M)                                                   Composition
                      1) SPARQL D/M           2) Index. D / SPARQL+Cache M  3) Full Indexed D/M
Dataset     G. size   G. time (s)  #SPARQL    G. time (s)  #SPARQL          G. time (s)  #SPARQL    G. size (opt)  Comp. time (s)  Sol. (serv./length)
WSC'08-01   35        28.52        3256       5.67         624              0.18         0          13 (-37%)      0.08            10/5
WSC'08-02   35        63.30        7349       11.76        1830             0.38         0          13 (-37%)      0.07            5/3
WSC'08-03   105       262.80       36619      20.05        3184             0.69         0          40 (-38%)      0.21            40/23
WSC'08-04   44        136.20       13828      21.12        3481             0.60         0          25 (-57%)      0.12            10/5
WSC'08-05   97        333.60       41148      26.05        4417             0.74         0          52 (-54%)      0.18            20/8
WSC'08-06   189       1051.20      93682      48.21        8511             1.12         0          75 (-40%)      1.05            42/7
WSC'08-07   124       1183.20      120881     35.76        6376             1.33         0          70 (-56%)      0.23            20/12
WSC'08-08   121       1656.00      89518      78.00        15844            1.48         0          58 (-48%)      0.34            30/20

Fig. 7. Composition time for different configurations.
(Plot of composition time against the number of services in
the dataset for SPARQL D/M, Index. D / SPARQL+Cache M,
and Full Indexed D/M.)


configuration. In this case, almost every composition is
calculated in less than a minute. The generated SPARQL
queries in this case are reduced by up to 91% (for the
WSC'08-03 dataset), leading to a significant performance
improvement. Although such an improvement can be
enough to solve the smaller datasets in a few seconds,
the latency of the SPARQL queries still remains a
bottleneck for bigger datasets like the WSC'08-08 dataset,
which still requires evaluating 15,844 SPARQL queries for
generating the composition graph in 78 seconds. Our
tests show, however, that the full indexed configuration
solves even the largest problems quickly (under 1.5
seconds of graph generation time in Table 2) by avoiding
the evaluation of SPARQL queries at composition time.
This configuration requires service registries to
additionally calculate and maintain the indexes.
Doing so, nonetheless, enables performing very
efficient composition over remote 3rd party controlled
service registries akin to what can be obtained by the
fastest composition engines in the unrealistic scenarios
where all services are available and pre-loaded in
memory. Additionally, using those indexes allows
service registries to provide highly efficient discovery for
a controlled set of queries, while retaining the ability to
offer fully flexible yet less efficient discovery support.
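
The effect of a local matchmaking cache on the number of remote queries can be illustrated with a minimal sketch. This is not the iServe or ComposIT implementation: the class name, the toy ontology, and the stand-in for a SPARQL-backed match are hypothetical, and the match is simplified to "the output concept equals or is a subclass of the input concept".

```python
from typing import Callable, Dict, Optional, Tuple


class CachedMatchmaker:
    """Wraps a remote concept matcher with a local cache (illustrative)."""

    def __init__(self, remote_match: Callable[[str, str], bool]) -> None:
        self._remote_match = remote_match
        self._cache: Dict[Tuple[str, str], bool] = {}
        self.remote_queries = 0  # remote calls actually issued

    def match(self, output_concept: str, input_concept: str) -> bool:
        key = (output_concept, input_concept)
        if key not in self._cache:
            # Cache miss: one remote query (e.g. one SPARQL query).
            self.remote_queries += 1
            self._cache[key] = self._remote_match(output_concept, input_concept)
        return self._cache[key]


# Hypothetical toy ontology standing in for the remote triple store.
SUBCLASS_OF = {"Car": "Vehicle", "Vehicle": "Thing"}


def toy_remote_match(output_concept: str, input_concept: str) -> bool:
    """True if output_concept equals or is a subclass of input_concept."""
    current: Optional[str] = output_concept
    while current is not None:
        if current == input_concept:
            return True
        current = SUBCLASS_OF.get(current)
    return False


matcher = CachedMatchmaker(toy_remote_match)
pairs = [("Car", "Vehicle"), ("Car", "Vehicle"),
         ("Vehicle", "Car"), ("Car", "Vehicle")]
results = [matcher.match(out, inp) for out, inp in pairs]
print(results, matcher.remote_queries)  # [True, True, False, True] 2
```

Here four match requests trigger only two remote queries, since repeated concept pairs are answered locally. The same pair of concepts is typically compared many times while the composition graph is generated, which is why caching reduces the query count so sharply. A fully indexed configuration goes one step further and precomputes the answers, so that no remote query is needed at composition time.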

We have also evaluated our framework with the
WSC'09-10 datasets. Results show a similar scalability
behaviour with respect to the number of services for each
configuration. Moreover, our approach is able to solve all
the datasets with optimal results, which are shown at
https://wiki.citius.usc.es/composit:wsc09.

7 Conclusions 

In this paper we have presented a theoretical analysis of
service composition in terms of its dependency on
service discovery. Driven by this analysis we have defined
a formal integrated graph-based composition framework
anchored on the integration of service discovery and
matchmaking within the composition process. We have
devised a reference implementation of this framework
on the basis of two pre-existing separate components,
namely iServe and ComposIT. This reference
implementation has been used to empirically study the impact of
discovery and matchmaking on service composition, and
we have provided three different configurations with
varying performance. Our empirical analysis shows that
the typical approaches followed by discovery
engines cannot serve as a suitable basis to support efficient
service composition, as they lead to prohibitive
execution times. We have also shown, though, that with
adequate interface granularity and indexing, discovery
engines can support highly efficient composition akin to
that obtained by the fastest composition engines without
having to assume the local availability and in-memory
preloading of service registries.

This work proves the scalability and flexibility of
our proposal and provides insights on how integrated
composition systems can be designed in order to achieve
good performance in real scenarios, where service
registries and composition frameworks are likely to be
distributed and controlled by diverse organisations.